Spectral analysis of communication networks using Dirichlet eigenvalues

Alexander Tsiatas, Iraj Saniee, Onuttom Narayan, and Matthew Andrews

Department of Computer Science and Engineering, University of California, San Diego, 9500 Gilman Drive, La Jolla, CA 92093-0404
Mathematics of Networks, Bell Laboratories, Alcatel-Lucent, 600 Mountain Avenue, Murray Hill, NJ 07974
Department of Physics, University of California, Santa Cruz, CA 95064



Abstract 

The spectral gap of the graph Laplacian with Dirichlet boundary conditions is computed
for the graphs of several communication networks at the IP-layer, which are subgraphs of
the much larger global IP-layer network. We show that the Dirichlet spectral gap of these
networks is substantially larger than the standard spectral gap and is likely to remain
non-zero in the infinite graph limit. We first prove this result for finite regular trees,
and show that the Dirichlet spectral gap in the infinite tree limit converges to the
spectral gap of the infinite tree. We also perform Dirichlet spectral clustering on the
IP-layer networks and show that it often yields cuts near the network core that create
genuine single-component clusters. This is much better than traditional spectral
clustering where several disjoint fragments near the periphery are liable to be
misleadingly classified as a single cluster. Spectral clustering is often used to identify
bottlenecks or congestion; since congestion in these networks is known to peak at the
core, our results suggest that Dirichlet spectral clustering may be better at finding
bona-fide bottlenecks.

1 Introduction 

Many real-world networks are truly vast, encompassing millions or billions of nodes and
edges, e.g., social and biological networks. This scale produces computational challenges:
the large majority of algorithms are too computationally intensive to use at this scale on
general graphs. Instead, one can study smaller sub-graphs of these networks; for example,
the portion of a social network corresponding to one university, or the portion of a
communication network corresponding to one Internet service provider.

Spectral graph theory [1], the study of eigenvalues and eigenvectors of graph-theoretic
matrices, is often used to analyze various graph properties. One might hope that the prop- 
erties of a large sub-graph of a network will be representative of the properties of the entire 
network. Unfortunately, the properties of an expander graph depend on the conditions 
imposed at its (large) boundary. In particular, the spectral gap of the graph Laplacian on 
a finite truncation of an infinite regular tree approaches zero as the size of the truncation 
is increased, even though the spectral gap of its infinite counterpart is non-zero. In this 
paper we show that, by contrast, if the spectral gap is calculated with Dirichlet boundary 
conditions, it approaches the infinite graph limit as the size of truncation is increased. 

Motivated by this result, we compute the Dirichlet spectral gap for ten IP-layer
communication networks as measured and documented by previous researchers in the Rocketfuel
database [18]. We find that the Dirichlet spectral gap is much larger than the traditional
spectral gap for these graphs. (Traditional spectral clustering uses the normalized Laplacian
matrix $\mathcal{L}$ or some similar matrix; we use the matrix $\mathcal{L}_D$, the Laplacian
restricted to the rows and columns corresponding to non-boundary nodes.) Moreover, unlike
the traditional spectral gap, it does not trend downwards for larger networks. This
indicates that the spectral gap for these networks, viewed as sub-graphs of an infinite
graph, is finite (nonzero).

There are precedents for treating networks essentially as subsets of an overarching
infinite graph; many network generation models [2], [6], [21] exhibit unique convergence
properties (to power-law degree distributions or otherwise) as the size of the network grows
to infinity. We also note that Dirichlet boundary conditions have been shown to be successful
at mitigating other boundary-related issues in graph vertex ranking [5].

There is a direct connection between the spectral gap and clustering in networks,
through the Cheeger inequality. Spectral graph theory has led to many effective algorithms
for finding cuts that result in a small Cheeger ratio, including spectral clustering
[15], [17], [16], [20] and local graph partitioning algorithms [1]. These algorithms have
been well-studied, both empirically [15], [17] and theoretically [15], [20]. Unfortunately,
these algorithms can also exhibit some undesirable behavior. It has been shown empirically
[12] that the "best" partitionings of many networks, as measured by the Cheeger ratio,
result in cutting off nodes or subtrees near the boundary of the network. The resulting
'clusters' near the boundary actually consist of several disjoint fragments. Especially when
viewed as subsets of larger networks, this kind of clustering is not particularly meaningful.

In this paper, we use Dirichlet spectral clustering to identify good cuts in the
networks in the Rocketfuel database. We use the top two eigenvectors of $\mathcal{L}_D$, the
graph Laplacian with Dirichlet boundary conditions, to cut the network into two sections. We
demonstrate that, compared to traditional spectral clustering, there is a substantial
reduction in the average number of components resulting from the cut, without a significant
increase in the Cheeger ratio. Instead of finding cuts near the boundaries of the networks,
Dirichlet spectral clustering obtains cuts in the network core.

The Cheeger ratio of a cut is a well known indicator of the congestion across the
cut; small Cheeger ratios are likely to be associated with bottlenecks. The emphasis on
identifying core bottlenecks becomes more critical in the light of the recent observation that
many real-world graphs exhibit large-scale curvature [10], [11]. It has been shown [14], [10]
that such global network curvature leads to core bottlenecks with load (or betweenness)
asymptotically much worse than flat networks, where "load" means the maximum total
flow through a node assuming unit traffic between every node-pair along shortest paths
[14]. As such, it is important to find and characterize bottlenecks at the core rather than
the fringes, where they do not matter as much. Our observations suggest that Dirichlet
spectral clustering may be more useful in this regard.

The rest of this paper is structured as follows: in Section 2 we give the theoretical
justification for using Dirichlet eigenvalues [3] instead of the traditional spectrum
for analyzing and clustering finite portions of infinite graphs. In Section 3 we then
compare the spectral gap using Dirichlet eigenvalues to the traditional spectral gap on real,
publicly-determined network topologies [18] that represent smaller portions of the wider
telecommunications grid. In Section 4 we demonstrate how Dirichlet spectral clustering
finds graph partitions that are more indicative of bottlenecks in the network core rather
than the fringes.



2 Spectrum of Finite Trees: Motivation for Dirichlet Spectral Clustering

Throughout this paper, we analyze general undirected connected graphs G by using the
normalized graph Laplacian $\mathcal{L}$, defined as in [4]. For two vertices x and y, the
corresponding matrix entry is:

\[
\mathcal{L}(x, y) =
\begin{cases}
1 & \text{if } x = y, \\
-\dfrac{1}{\sqrt{d_x d_y}} & \text{if } x \text{ and } y \text{ are adjacent, and} \\
0 & \text{otherwise,}
\end{cases}
\]

where $d_x$ and $d_y$ are the degrees of x and y. We denote by $\lambda$ the spectral gap,
which is simply the smallest nonzero eigenvalue of $\mathcal{L}$.
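The definition above can be checked numerically on a small graph. The following is a minimal sketch, assuming NumPy is available and that the graph is given as a dense symmetric 0/1 adjacency matrix with no isolated vertices; the function names `normalized_laplacian` and `spectral_gap` are illustrative and do not come from the paper.

```python
import numpy as np

def normalized_laplacian(adj):
    """Return I - D^{-1/2} A D^{-1/2} for a symmetric 0/1 adjacency matrix.

    Assumes every vertex has degree at least 1.
    """
    adj = np.asarray(adj, dtype=float)
    inv_sqrt_deg = 1.0 / np.sqrt(adj.sum(axis=1))
    return np.eye(len(adj)) - inv_sqrt_deg[:, None] * adj * inv_sqrt_deg[None, :]

def spectral_gap(adj, tol=1e-9):
    """Smallest eigenvalue of the normalized Laplacian that exceeds `tol`."""
    eigenvalues = np.linalg.eigvalsh(normalized_laplacian(adj))
    return float(min(ev for ev in eigenvalues if ev > tol))

# Complete graph on 4 vertices: the nonzero eigenvalues all equal 4/3.
K4 = np.ones((4, 4)) - np.eye(4)
print(spectral_gap(K4))
```

A dense eigensolver is only practical for small graphs; for networks of the size studied later, a sparse solver would be the more natural choice.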

For any graph G and finite subgraph $S \subset G$, the Cheeger ratio h(S) is a measure of
the cut induced by S:

\[ h(S) = \frac{e(S, \bar{S})}{\min(\mathrm{vol}(S), \mathrm{vol}(\bar{S}))}. \]

We use $e(S, \bar{S})$ to denote the number of edges crossing from S to its complement, and
the volume vol(S) is simply the sum of the degrees of all nodes in S. The Cheeger constant h
is the minimum h(S) over all subsets S. The Cheeger constant and spectral gap are related
by the following Cheeger inequality [1]:

\[ 2h \ge \lambda \ge \frac{h^2}{2}. \]

Both $\lambda$ and h are often used to characterize expansion or bottlenecks in graphs. This
inequality shows that they are both good candidates and gives the ability to estimate one
based on the other.
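To make the Cheeger ratio and Cheeger constant concrete, here is a small brute-force sketch using only the standard library. It assumes the graph is given as a dictionary of adjacency lists; the helper names are hypothetical, and the exhaustive search over subsets is exponential, so it is meant for toy graphs only.

```python
from itertools import combinations

def cheeger_ratio(adj_list, subset):
    """h(S) = e(S, complement) / min(vol(S), vol(complement))."""
    subset = set(subset)
    crossing = sum(1 for u in subset for v in adj_list[u] if v not in subset)
    vol_s = sum(len(adj_list[u]) for u in subset)
    vol_total = sum(len(nbrs) for nbrs in adj_list.values())
    return crossing / min(vol_s, vol_total - vol_s)

def cheeger_constant(adj_list):
    """Minimum h(S) over all nonempty proper subsets (exponential time)."""
    nodes = list(adj_list)
    best = float("inf")
    for size in range(1, len(nodes)):
        for subset in combinations(nodes, size):
            best = min(best, cheeger_ratio(adj_list, subset))
    return best

# Path 0-1-2-3: the best cut splits it in the middle, giving h = 1/3.
path = {0: [1], 1: [0, 2], 2: [1, 3], 3: [2]}
h = cheeger_constant(path)
print(h)
```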

For the infinite d-regular tree, the spectral gap and Cheeger constant have both
been analytically determined [3], [13]. Using i2, the spectral gap is

\[ \lambda = 1 - \frac{2}{d}\sqrt{d-1}, \tag{1} \]

and the Cheeger constant is $h = d - 2$ [9]. Both of these values are nonzero, indicating good
expansion. However, the Cheeger ratio for truncated d-regular trees (TdT), those with
all branches of the infinite tree cut off beyond some radius r from the center, approaches
zero as the tree gets deeper. By cutting off any one subtree S from the root, there is only
one edge connecting S to $\bar{S}$, and as the tree gets deeper, this ratio gets arbitrarily
small. Using the Cheeger inequality, it follows that $\lambda_{TdT} \to 0$ as $r \to \infty$.
Thus, the standard spectral properties of finite trees do not approach the infinite case as
they get larger; in fact, they suggest the opposite. This is problematic when making
qualitative observations about networks and their expansion, necessitating another tool for
spectral analysis of networks.
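The shrinking ratio of the one-subtree cut can be tabulated directly. The sketch below uses only the standard library; it assumes the truncated tree has a root with d children, every other internal node with d - 1 children, and leaves at distance `radius` from the root. The function names are illustrative. Since the Cheeger constant is at most the ratio of any particular cut, the inequality $\lambda \le 2h$ bounds the traditional spectral gap by twice the printed value.

```python
def truncated_regular_tree(d, radius):
    """Adjacency lists of the d-regular tree cut off at `radius` from the root."""
    adj = {0: []}
    frontier = [0]
    for _ in range(radius):
        next_frontier = []
        for node in frontier:
            children = d if node == 0 else d - 1
            for _ in range(children):
                child = len(adj)
                adj[child] = [node]
                adj[node].append(child)
                next_frontier.append(child)
        frontier = next_frontier
    return adj

def subtree_cut_ratio(d, radius):
    """Cheeger ratio of the cut that removes one child subtree of the root."""
    adj = truncated_regular_tree(d, radius)
    # Node 1 is the root's first child; collect everything hanging below it.
    subtree, stack = {1}, [1]
    while stack:
        node = stack.pop()
        for nbr in adj[node]:
            if nbr != 0 and nbr not in subtree:
                subtree.add(nbr)
                stack.append(nbr)
    vol_s = sum(len(adj[u]) for u in subtree)
    vol_total = sum(len(nbrs) for nbrs in adj.values())
    return 1 / min(vol_s, vol_total - vol_s)  # exactly one crossing edge

for r in range(1, 7):
    print(r, subtree_cut_ratio(3, r))
```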

The main reason why the traditional spectral gap does not capture expansion well
in large, finite trees is the existence of a boundary. This is also problematic in network
partitioning algorithms; often the "best" partition is a "bag of whiskers", a combination
of several smaller cuts near the boundary [12]. In this paper, we will use Dirichlet
eigenvalues to eliminate this problem.

Dirichlet eigenvalues are the eigenvalues of a truncated matrix, eliminating the rows
and columns that are associated with nodes on the graph boundary. We will use a truncated
normalized graph Laplacian, $\mathcal{L}_D$, a submatrix of $\mathcal{L}$. This is different
from simply taking the Laplacian of an induced subgraph, as the edges leading to the boundary
nodes are still taken into account; it is only the boundary nodes themselves that are
ignored. We define the Dirichlet spectral gap to be the smallest eigenvalue of $\mathcal{L}_D$.
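The distinction between truncating the Laplacian and taking the Laplacian of an induced subgraph can be seen in a few lines. This is a minimal sketch, assuming NumPy, a dense adjacency matrix with nodes indexed 0..n-1, and a caller-supplied boundary set; `dirichlet_spectral_gap` is a hypothetical name.

```python
import numpy as np

def dirichlet_spectral_gap(adj, boundary):
    """Smallest eigenvalue of the normalized Laplacian restricted to interior nodes.

    Degrees are computed on the full graph, so edges into the boundary still
    count; only the boundary rows and columns are deleted.
    """
    adj = np.asarray(adj, dtype=float)
    boundary = set(boundary)
    inv_sqrt_deg = 1.0 / np.sqrt(adj.sum(axis=1))
    laplacian = np.eye(len(adj)) - inv_sqrt_deg[:, None] * adj * inv_sqrt_deg[None, :]
    interior = [v for v in range(len(adj)) if v not in boundary]
    restricted = laplacian[np.ix_(interior, interior)]
    return float(np.linalg.eigvalsh(restricted)[0])

# Star with centre 0 and three leaves; the leaves form the boundary.
star = np.zeros((4, 4))
star[0, 1:] = 1
star[1:, 0] = 1
print(dirichlet_spectral_gap(star, boundary=[1, 2, 3]))  # 1.0
```

In the star example the only interior node is the centre, so the truncated matrix is the single entry 1. The induced subgraph on the centre alone would be an isolated vertex, which carries no information about the three edges that were cut.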

Using Dirichlet eigenvalues, it is also possible to obtain a local Cheeger inequality [3]
for the sub-graph S. First, the local Cheeger ratio is defined [3] for a set of nodes
$T \subset S$ as

\[ H(T) = \frac{e(T, \bar{T})}{\mathrm{vol}(T)}; \]

because the boundary nodes are excluded from S in the definition of [3], the set T cannot
contain any boundary nodes of S. The local Cheeger ratio H(T) is the appropriate quantity
when S is a sub-graph of a larger graph. The local Cheeger constant $h_S$ for S is then
defined as the minimum of H(T) for all $T \subset S \setminus \delta(S)$. The local Cheeger
inequality obtained in [3] is

\[ h_S \ge \lambda_S \ge \frac{h_S^2}{2}, \]

where $\lambda_S$ is the Dirichlet eigenvalue of the Laplacian restricted to the rows and
columns corresponding to nodes in S. This inequality indicates a relationship between local
expansion and bottlenecks.

The use of Dirichlet eigenvalues requires that the boundary of the graph S be defined.
If S is a tree, the leaf nodes are a natural choice. When S is actually a finite truncation
of a larger graph, the boundary can be defined as the set of nodes that connect directly to
other nodes outside the truncation; for the Rocketfuel data [18], we will use the nodes with
degree 1 which presumably connect outside of the subnetwork.
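Both boundary conventions mentioned above are easy to express on an edge list. The sketch below uses only the standard library and assumes an undirected edge list without self-loops or duplicate edges; the function names are illustrative and not taken from the Rocketfuel tooling.

```python
def degree_one_boundary(edges):
    """Return the nodes of degree 1 in an undirected edge list."""
    degree = {}
    for u, v in edges:
        degree[u] = degree.get(u, 0) + 1
        degree[v] = degree.get(v, 0) + 1
    return {node for node, deg in degree.items() if deg == 1}

def truncation_boundary(full_edges, kept_nodes):
    """Nodes of `kept_nodes` with at least one neighbour outside the truncation."""
    kept = set(kept_nodes)
    boundary = set()
    for u, v in full_edges:
        if u in kept and v not in kept:
            boundary.add(u)
        elif v in kept and u not in kept:
            boundary.add(v)
    return boundary

# Triangle a-b-c with a pendant node d attached to c.
edges = [("a", "b"), ("b", "c"), ("c", "a"), ("c", "d")]
print(degree_one_boundary(edges))                    # {'d'}
print(truncation_boundary(edges, ["a", "b", "c"]))   # {'c'}
```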




Figure 1: Dirichlet spectral gap for successively larger 3-regular trees, showing convergence
to a nonzero value. (Horizontal axis: tree depth.)



We first use Dirichlet eigenvalues on d-regular trees as prototypical evidence for
their effectiveness in capturing true spectral properties on real-world networks. There is
empirical evidence in Figure 1 showing that the Dirichlet spectral gap for 3-regular trees
indeed converges to a nonzero value as tree depth increases, contrasting with the traditional
spectral gap which converges to zero. This is made rigorous in the following theorem:

Theorem 1. For finite d-regular trees of depth L, the Dirichlet spectral gap converges to
the true spectral gap (1) of the infinite tree as L approaches infinity.

Proof. To derive the Dirichlet spectral gap for finite trees using the leaves as the boundary,
we will solve a recurrence that arises from the tree structure and the standard eigenvalue
equation

\[ \mathcal{L}_D x = \lambda x. \tag{2} \]

Let T be a d-regular tree of depth L + 1; the (L + 1)st level is the boundary. We first
consider eigenvectors x which have the same value at every node at the same depth within
T; these eigenvectors are azimuthally symmetric. We can represent each such eigenvector
x as a sequence of values $(x_0, x_1, \ldots, x_L)$, where $x_i$ is the uniform value at all
nodes at depth i, similar to the analysis of the infinite-tree spectral gap appearing in [7].
Using this eigenvector form for x in (2) leads to the recurrence:

\[ x_i - \frac{1}{d} x_{i-1} - \frac{d-1}{d} x_{i+1} = \lambda x_i, \qquad 1 \le i \le L. \tag{3} \]

At the leaves of the tree, we have the Dirichlet boundary condition:

\[ x_{L+1} = 0, \tag{4} \]

and at the root of the tree we have the boundary condition

\[ x_0 - x_1 = \lambda x_0. \tag{5} \]

We can solve (3) using the characteristic equation:

\[ \frac{d-1}{d} r^2 - (1 - \lambda) r + \frac{1}{d} = 0, \]

whose roots can be written as

\[ r_{1,2} = \frac{1}{\sqrt{d-1}} e^{\pm i\alpha} \tag{6} \]

with

\[ \lambda = 1 - \frac{2}{d}\sqrt{d-1}\cos\alpha. \tag{7} \]

Since $\lambda$ has to be real, either the real or imaginary part of $\alpha$ must be zero.
Substituting the first boundary condition (4) yields a solution to (3) with the form

\[ x_n = A\left(r_1^{\,n-L-1} - r_2^{\,n-L-1}\right) \tag{8} \]

for some constant A and $r_{1,2}$ given in (6). Using (5), the condition for eigenvalues is

\[ \frac{\tan\alpha}{\tan (L+1)\alpha} = -\frac{d-2}{d}, \qquad 0 < \alpha < \pi. \tag{9} \]

Since $\tanh x / \tanh (L+1)x$ is positive for all real x, there are no imaginary solutions to
Eq. (9). Therefore all the L + 1 solutions are real. From Eq. (7), the corresponding L + 1
eigenvalues are all outside the infinite-tree spectral gap.
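The step from the root condition (5) to the eigenvalue condition (9) can be spelled out as follows. This is a sketch of the intermediate algebra, assuming the symmetric solution is proportional to $(d-1)^{-n/2}\sin((L+1-n)\alpha)$, which is the form that vanishes at level L + 1 and solves the recurrence (3) when the characteristic roots are $e^{\pm i\alpha}/\sqrt{d-1}$.

```latex
\begin{align*}
x_n &\propto (d-1)^{-n/2}\,\sin\bigl((L+1-n)\alpha\bigr), \\
(1-\lambda)\,x_0 = x_1
  &\;\Longrightarrow\;
  \frac{2\sqrt{d-1}}{d}\cos\alpha\,\sin\bigl((L+1)\alpha\bigr)
  = \frac{1}{\sqrt{d-1}}\,\sin(L\alpha), \\
\sin(L\alpha)
  &= \sin\bigl((L+1)\alpha\bigr)\cos\alpha - \cos\bigl((L+1)\alpha\bigr)\sin\alpha, \\
\frac{d-2}{d}\,\cos\alpha\,\sin\bigl((L+1)\alpha\bigr)
  &= -\sin\alpha\,\cos\bigl((L+1)\alpha\bigr), \\
\frac{\tan\alpha}{\tan\bigl((L+1)\alpha\bigr)}
  &= -\frac{d-2}{d}.
\end{align*}
```

The second line uses (7) to replace $1-\lambda$, and the fourth line follows after multiplying the second by $\sqrt{d-1}$ and collecting the $\cos\alpha\,\sin((L+1)\alpha)$ terms.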

We now consider eigenvectors which are zero at all nodes up to the k'th level with
$L > k \ge 0$. The eigenvector is non-zero at two daughters of some k'th level node and
the descendants thereof. We assume azimuthal symmetry inside both these two sectors.
The eigenvalue condition for the parent node at the k'th level forces the eigenvector to
be opposite in the two sectors. Inside each sector, (3), (6), (7), (4) and (8) are still valid.
However, (5) is replaced by the condition $x_k = 0$, from which $\sin (L+1-k)\alpha = 0$. There
are L - k real solutions to this equation, corresponding to eigenvalues that lie outside the
infinite-tree spectral gap, each with degeneracy $d^k(d-1)$. The total number of eigenvalues
we have found so far is

\[ L + 1 + \sum_{k=0}^{L-1} d^k (d-1)(L-k) = \frac{d^{L+1}-1}{d-1}, \tag{10} \]

i.e. we have found all the eigenvalues. As L gets larger, the smallest $\alpha$ approaches 0,
showing that the Dirichlet spectral gap converges to the spectral gap of the infinite tree.
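The convergence claimed in the theorem can be illustrated numerically. The sketch below uses only the standard library and assumes the eigenvalue condition for symmetric eigenvectors reads tan(alpha) / tan((L+1) alpha) = -(d-2)/d, as follows from the root condition and the recurrence; it locates the smallest root by bisection and converts it to an eigenvalue via lambda = 1 - (2/d) sqrt(d-1) cos(alpha). Function names are hypothetical, and the bracketing interval assumes d >= 3 and L >= 1.

```python
import math

def dirichlet_gap_regular_tree(d, L, iterations=200):
    """Smallest Dirichlet eigenvalue for interior levels 0..L of a d-regular tree."""
    # Multiplying the tangent condition by d * cos(alpha) * cos((L+1) alpha)
    # gives g(alpha) = 0 without singularities.
    def g(alpha):
        return (d * math.sin(alpha) * math.cos((L + 1) * alpha)
                + (d - 2) * math.cos(alpha) * math.sin((L + 1) * alpha))

    # g > 0 at pi/(2(L+1)) and g < 0 at pi/(L+1); the smallest root lies between.
    lo, hi = math.pi / (2 * (L + 1)), math.pi / (L + 1)
    for _ in range(iterations):
        mid = 0.5 * (lo + hi)
        if g(mid) > 0:
            lo = mid
        else:
            hi = mid
    alpha = 0.5 * (lo + hi)
    return 1 - (2 / d) * math.sqrt(d - 1) * math.cos(alpha)

def infinite_tree_gap(d):
    """Spectral gap of the infinite d-regular tree, 1 - (2/d) sqrt(d-1)."""
    return 1 - (2 / d) * math.sqrt(d - 1)

for L in (2, 5, 10, 50, 200):
    print(L, dirichlet_gap_regular_tree(3, L), infinite_tree_gap(3))
```

For L = 1 the symmetric problem reduces to a 2 x 2 matrix whose smallest eigenvalue is 1 - 1/sqrt(d), which gives a quick sanity check on the root finder.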



This derivation shows that Dirichlet eigenvalues capture the expansion properties 
of trees much better than the traditional spectral gap which has been shown to approach 
zero for large finite trees. This behavior on trees suggests that Dirichlet eigenvalues are a 
good candidate for use in analyzing real-world networks. Such analysis appears in Section 3.



3 Spectrum of Rocketfuel Networks 

Our research is motivated by a series of datasets representing portions of network topologies
using Rocketfuel [18]. Rocketfuel datasets are publicly available, created using traceroute
and other networking tools to determine portions of network topology corresponding to
individual Internet service providers. Even though, like most measured datasets, the
Rocketfuel networks are not free of errors (see for example [19]), they provide valuable
connectivity information at the IP-layer of service provider networks across the globe.
Because the datasets were created in this manner, they represent only subsets of the vast
Internet; it becomes impossible to determine network topology at certain points. For example,
corporate intranets, home networks, other ISPs, and network-address translation cannot be
explored. The networks used range in size from 121 to 10,152 nodes.

Because of the method of data collection, the Rocketfuel datasets contain many
degree-1 nodes that appear at the edge of the topology. In actuality, the network extends
beyond this point, but the datasets are limited to one ISP at a time. As such, it makes sense
to view these degree-1 nodes as the boundary of a finite subset of a much larger network.
Using this boundary definition, we compute the Dirichlet spectral gaps of these graphs and
compare with their standard counterparts, as shown in Table 1 and Figure 2. It is apparent
that the Dirichlet spectral gaps are much larger than the traditional spectral gaps for all the
networks, implying a much higher degree of expansion than one would traditionally obtain.
The spectral gaps for a two-dimensional square Euclidean grid are also shown; the grid is
known to be a poor expander, and accordingly even the Dirichlet spectral gap is very small.

Figure 3 shows the same data, plotted as a function of the number of nodes in
each network. We see that the traditional spectral gap keeps decreasing as N is increased,
whereas the Dirichlet spectral gap does not.

Since Figure 3 compares different networks, possibly with different properties, we
confirm the result by computing the spectral gap for subgraphs of different sizes drawn

| Dataset ID | Nodes | Edges | Traditional spectral gap | Dirichlet spectral gap |
|------------|-------|-------|--------------------------|------------------------|
| 1221       | 2998  | 3806  | 0.00386                  | 0.07616                |
| 1239       | 8341  | 14025 | 0.01593                  | 0.03585                |
| 1755       | 605   | 1035  | 0.00896                  | 0.09585                |
| 2914       | 7102  | 12291 | 0.00118                  | 0.04621                |
| 3257       | 855   | 1173  | 0.01045                  | 0.04738                |
| 3356       | 3447  | 9390  | 0.00449                  | 0.05083                |
| 3967       | 895   | 2070  | 0.00799                  | 0.03365                |
| 4755       | 121   | 228   | 0.03570                  | 0.06300                |
| 6461       | 2720  | 3824  | 0.00639                  | 0.11036                |
| 7018       | 10152 | 14319 | 0.00029                  | 0.09531                |
| Grid       | 10000 | 19800 | 0.00025                  | 0.00050                |

Table 1: Structural and spectral properties of Rocketfuel datasets.



from a single network. All the nodes that are within a distance r of the center of mass of a
network are included in a subgraph, with r varying between 1 and the maximum possible
value for the network. Fig. 4 shows the results for the largest of the Rocketfuel networks,
dataset 7018, containing over 10,000 nodes. For a subgraph of radius r, the boundary is
defined as all the nodes which i) have edges connecting them to nodes in the graph that
are outside the subgraph or ii) connect to the outside world, i.e. that have degree 1 in the
full dataset. As in Fig. 3, in Fig. 4 the traditional spectral gap keeps decreasing as r is
increased, but the Dirichlet spectral gap does not.
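The subgraph construction described above can be sketched with a breadth-first search. This is an illustrative standard-library sketch under two assumptions that the text does not spell out: distance is measured in hops, and the center of mass is supplied by the caller as a node. The name `ball_with_boundary` is hypothetical.

```python
from collections import deque

def ball_with_boundary(adj, center, radius):
    """Return (nodes, boundary) for the subgraph within `radius` hops of `center`.

    A node is on the boundary if it has a neighbour outside the ball (rule i)
    or if it has degree 1 in the full graph (rule ii).
    """
    dist = {center: 0}
    queue = deque([center])
    while queue:
        node = queue.popleft()
        if dist[node] == radius:
            continue
        for nbr in adj[node]:
            if nbr not in dist:
                dist[nbr] = dist[node] + 1
                queue.append(nbr)
    nodes = set(dist)
    boundary = {
        u for u in nodes
        if len(adj[u]) == 1 or any(v not in nodes for v in adj[u])
    }
    return nodes, boundary

# Path 0-1-2-3-4 with an extra leaf 5 attached to node 2.
adj = {0: [1], 1: [0, 2], 2: [1, 3, 5], 3: [2, 4], 4: [3], 5: [2]}
nodes, boundary = ball_with_boundary(adj, center=2, radius=1)
print(sorted(nodes), sorted(boundary))  # [1, 2, 3, 5] [1, 3, 5]
```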

4 Spectral Decomposition 

One important application of the eigendecomposition of a graph is spectral clustering or
partitioning [15], [17]. The problem is to group the nodes into partitions, clusters, or
communities that are inherently well-connected within themselves, with sparser connections
between clusters. This is closely related to finding bottlenecks; if a graph has a bottleneck,
then a good partition is often found by dividing the graph at the bottleneck. See [16] for a
general survey of graph clustering.

It is often desirable for a network partition to be balanced, and finding bottlenecks 
near the core or center of mass of a network is often more useful than simply clipping small 
subsets of nodes near the boundary. But according to [12], using the Cheeger ratio as a 
metric on real-world data, the "best" cuts larger than a certain critical size are actually 
"bags of whiskers" or combinations of numerous smaller cuts. Because many graph cluster- 
ing algorithms, including spectral clustering, try to optimize for this metric, the resulting 
partitions often slice numerous smaller cuts off the graph, which is not always useful. For 
our Rocketfuel data, we know that the boundary of the network is imposed by the method 
of data collection. Thus, by eliminating the boundary from graph clustering, we can more 
easily find partitions that are more evenly balanced, and bottlenecks that are closer to the
core of the network.

Figure 2: Comparison of traditional and Dirichlet spectral gaps in Rocketfuel data as well
as the 2-dimensional Euclidean grid.

To do this, we use standard spectral clustering techniques from [15], but instead
of using the normalized graph Laplacian $\mathcal{L}$, we use the truncated Dirichlet version
$\mathcal{L}_D$. The eigenvectors used for clustering will therefore not include components
for the degree-1 boundary nodes, but we can assign them to the same side of the partition as
their non-boundary neighbor nodes. Specifically, we compute the first two eigenvectors of
$\mathcal{L}_D$ and cluster the nodes based on their components in these eigenvectors using
k-means. For each node, we compute the distance to both centers and sort the nodes based on
the difference. For a partition of size k, we take the top k nodes.
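The procedure in the paragraph above can be sketched as follows. This is not the authors' implementation: it assumes NumPy, a dense adjacency matrix, that "first two eigenvectors" means those with the two smallest eigenvalues, and a plain Lloyd iteration seeded by the extremes of the second eigenvector (the text does not specify the k-means initialization). All function names are hypothetical.

```python
import numpy as np

def dirichlet_embedding(adj, boundary):
    """Interior node list and the two lowest eigenvectors of the truncated Laplacian."""
    adj = np.asarray(adj, dtype=float)
    boundary = set(boundary)
    inv_sqrt_deg = 1.0 / np.sqrt(adj.sum(axis=1))
    laplacian = np.eye(len(adj)) - inv_sqrt_deg[:, None] * adj * inv_sqrt_deg[None, :]
    interior = [v for v in range(len(adj)) if v not in boundary]
    _, vectors = np.linalg.eigh(laplacian[np.ix_(interior, interior)])
    return interior, vectors[:, :2]

def two_means(points, iterations=50):
    """Lloyd iterations with two centers, seeded by the extremes of column 1."""
    centers = points[[np.argmin(points[:, 1]), np.argmax(points[:, 1])]]
    for _ in range(iterations):
        dists = np.linalg.norm(points[:, None, :] - centers[None, :, :], axis=2)
        labels = dists.argmin(axis=1)
        for j in range(2):
            if np.any(labels == j):
                centers[j] = points[labels == j].mean(axis=0)
    return centers

def dirichlet_cut(adj, boundary, size):
    """Pick `size` interior nodes, then attach each boundary node to its neighbour's side."""
    adj = np.asarray(adj)
    boundary = set(boundary)
    interior, points = dirichlet_embedding(adj, boundary)
    centers = two_means(points)
    dists = np.linalg.norm(points[:, None, :] - centers[None, :, :], axis=2)
    order = np.argsort(dists[:, 0] - dists[:, 1])  # nearest to center 0 first
    side = {interior[i] for i in order[:size]}
    for b in boundary:
        nbrs = [int(v) for v in np.flatnonzero(adj[b]) if int(v) not in boundary]
        if nbrs and nbrs[0] in side:
            side.add(b)
    return side

# Two 4-cliques joined by the edge 3-4, with pendant boundary nodes 8 and 9.
adj = np.zeros((10, 10), dtype=int)
for group in ([0, 1, 2, 3], [4, 5, 6, 7]):
    for u in group:
        for v in group:
            if u != v:
                adj[u, v] = 1
for u, v in [(3, 4), (0, 8), (7, 9)]:
    adj[u, v] = adj[v, u] = 1
side = dirichlet_cut(adj, boundary=[8, 9], size=4)
print(sorted(side))
```

Which clique is returned depends on the arbitrary sign of the eigenvectors, so the example should be read as "one clique together with its pendant node" rather than a specific node list.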

We follow the experiments of Leskovec et al. in [12] by using both traditional spectral
clustering and Dirichlet spectral clustering to find cuts of different sizes. Specifically, we find
Dirichlet cuts of all possible sizes, and then we find cuts using traditional spectral clustering
for those same sizes after adding boundary nodes back in. Thus, for each network of N
nodes, we calculate N - B cuts, where B is the number of boundary nodes.

For each cut, we measure the Cheeger ratio h and the number of components c. 
Ideally, a logical cut would split the network into exactly c = 2 components, but as Leskovec 
et al. demonstrated, as cut size increases, spectral clustering and other algorithms that 
optimize for h yield cuts with many components. This is precisely the problem we are 
trying to avoid using Dirichlet clustering, and our results show that Dirichlet clustering is 
effective in finding cuts with fewer components. Furthermore, even though our algorithm 
is not specifically optimizing for h, it does not find cuts that have significantly worse values 
for h while finding cuts with far fewer components. 
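The two per-cut measurements can be sketched with the standard library alone. The sketch assumes the graph is a dictionary of adjacency lists and that c counts connected components on both sides of the cut, so that an ideal cut has c = 2; the function names are illustrative.

```python
def count_components(adj, nodes):
    """Number of connected components of the subgraph induced by `nodes`."""
    nodes, seen, components = set(nodes), set(), 0
    for start in nodes:
        if start in seen:
            continue
        components += 1
        seen.add(start)
        stack = [start]
        while stack:
            u = stack.pop()
            for v in adj[u]:
                if v in nodes and v not in seen:
                    seen.add(v)
                    stack.append(v)
    return components

def cut_metrics(adj, side):
    """Return (h, c): Cheeger ratio of the cut and component count of both sides."""
    side = set(side)
    other = set(adj) - side
    crossing = sum(1 for u in side for v in adj[u] if v not in side)
    vol_side = sum(len(adj[u]) for u in side)
    vol_other = sum(len(adj[u]) for u in other)
    h = crossing / min(vol_side, vol_other)
    c = count_components(adj, side) + count_components(adj, other)
    return h, c

# Star with centre 0 and leaves 1..4, plus a tail 0-5-6.
adj = {0: [1, 2, 3, 4, 5], 1: [0], 2: [0], 3: [0], 4: [0], 5: [0, 6], 6: [5]}
print(cut_metrics(adj, {1, 2}))  # two whiskers lumped together: h = 1.0, c = 3
print(cut_metrics(adj, {5, 6}))  # the tail as one piece: h = 1/3, c = 2
```

In this toy graph the two-whisker cut is a single "cluster" made of two disconnected leaves, which is the kind of multi-component cut the text describes.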

We outline some aggregate data in Table 2. For several datasets, we count the
number of cuts in four different categories, comparing the Dirichlet Cheeger ratio and
number of components ($h_D$ and $c_D$) with traditional spectral clustering ($h_T$ and
$c_T$). It is evident that Dirichlet clustering finds cuts with fewer components than
traditional spectral clustering ($c_D < c_T$) for most cut sizes, indicating that while
spectral clustering optimizes for Cheeger ratio, it often "cheats" by collecting whiskers as
one cut. In addition, even though traditional spectral clustering optimizes for the Cheeger
ratio, Dirichlet clustering sometimes finds cuts with better Cheeger ratio as well. In the
"Avg" columns for each dataset, we give the difference in h and c averaged out over all cut
sizes. It turns out that the Cheeger ratios, on average, are not drastically different
between the two methods, and Dirichlet clustering gives cuts with far fewer components.

Figure 3: Comparison of traditional and Dirichlet spectral gaps across Rocketfuel networks.

Along with our aggregate data, we illustrate each individual cut for several of our
Rocketfuel datasets in Fig. 5. (A few of the datasets were too large for accurate numerical
computation.) For each cut size, we plot a point corresponding to the difference in Cheeger
ratio h and the number of components c between Dirichlet and traditional spectral
clustering. It should be clear that for the majority of cut sizes, Dirichlet clustering finds
cuts with far fewer components, but there is generally little change in Cheeger ratio. This
can be seen in the large variation on the c-axis with much smaller discrepancies on the
h-axis. In other words, Dirichlet clustering avoids finding "bags of whiskers" while still
maintaining good separation in terms of h, despite not explicitly optimizing for h.

It is clear that using Dirichlet eigenvalues improves the partition by ignoring the
boundary, alleviating the tendency to find "bags of whiskers" without drastically changing
the Cheeger ratio. Although traditional spectral clustering does not always fail, there is
clear evidence that Dirichlet spectral properties are an important tool in the analysis of
real-world networks.

Figure 4: Comparison of traditional and Dirichlet spectral gaps in successively larger
subgraphs, grown from the center of mass of dataset 7018.



5 Discussion 

Our results show evidence that eigenvalues of the graph Laplacian can provide rich
information about real-world networks when Dirichlet boundary conditions are applied. We find
that the Dirichlet spectral gap computed for several IP-layer networks is much larger than
the traditional spectral gap, and is likely to go to a finite limit as the size of the network
is increased. Rigorous analysis for infinite d-regular trees suggests that this may be the
same as the spectral gap of an infinite communications network. Spectral clustering using
Dirichlet eigenvalues yields cuts with far fewer components than traditional spectral
clustering, without a significant increase in the Cheeger ratio.

The spectral decomposition using Dirichlet eigenvalues also suggests a connection
to large-scale negative curvature [10], [11], [13] in the Rocketfuel data. Traditional
negatively curved graphs such as trees and hyperbolic grids generally exhibit poor
connectivity and core congestion. Standard clustering often yields combinations of smaller
cuts near the periphery of the graph, but using Dirichlet clustering, we can see that there
tend to be bad larger-scale cuts as well in the Rocketfuel datasets, in the graph interior.
The presence of these larger-scale cuts is a hallmark of negative curvature or hyperbolicity
[8], suggesting that Dirichlet spectral clustering may yield different behavior for
hyperbolic and flat networks. The hyperbolic grids themselves are also suitable for further
analysis, building from our study of regular trees. Many properties such as the spectral gap
remain open questions.

With some evidence of a connection between global negative curvature, the spectral
gap, and expansion, it would be interesting to empirically compare the hyperbolicity
$\delta$, the Cheeger constant h, and the traditional and Dirichlet spectral gaps of
Rocketfuel and other real-world networks as well as well-known network models. From this, it
could be possible to classify various networks based on these properties.








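Number of cuts in each category (first four columns), followed by averages over all cut sizes:

| Dataset | c_D < c_T, h_D < h_T | c_D < c_T, h_D > h_T | c_D > c_T, h_D < h_T | c_D > c_T, h_D > h_T | Avg c_D - c_T | Avg h_D - h_T | Avg c_T | Avg h_T |
|---------|----------------------|----------------------|----------------------|----------------------|---------------|---------------|---------|---------|
| 1221    | 49                   | 197                  |                      | 6                    | -28.9         | 0.0506        | 36.8    | 0.0829  |
| 1239    | 538                  | 362                  |                      | 30                   | -75.1         | 0.0127        | 83.4    | 0.1326  |
| 1755    | 32                   | 91                   |                      | 14                   | -4.5          | 0.0545        | 7.9     | 0.1210  |
| 2914    | 224                  | 819                  |                      | 323                  | -107.3        | 0.0565        | 125.8   | 0.1639  |
| 3257    | 49                   | 67                   |                      | 35                   | -12.3         | 0.0370        | 20.0    | 0.1386  |
| 3356    | 182                  | 315                  | 3                    | 41                   | -34.6         | 0.0388        | 45.6    | 0.1895  |
| 3967    | 24                   | 137                  | 3                    | 129                  | -3.2          | 0.1423        | 9.2     | 0.1215  |
| 4755    | 15                   | 6                    |                      | 6                    | -12.3         | -0.0970       | 15.4    | 0.3460  |
| 6461    | 111                  | 199                  |                      | 73                   | -13.4         | 0.0148        | 19.7    | 0.0999  |
| 7018    | 157                  | 465                  | 12                   | 273                  | -54.3         | 0.0403        | 81.4    | 0.0735  |
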

Table 2: Aggregate data comparing Dirichlet spectral clustering with traditional spectral 
clustering for several Rocketfuel datasets. For each dataset, we compute Dirichlet cuts of all 
possible sizes, and compare them with traditional spectral cuts with the same sizes. Smaller 
values of c and h are better. We classify the cuts into four categories, counting the number
in each, and we also give the average difference in h and c between Dirichlet and traditional 
spectral clustering. The data shows that Dirichlet clustering finds cuts with many fewer 
components without significant adverse effects on the Cheeger ratio. 

Figure 5: Comparison of Cheeger ratio h and number of components c for cuts for various
datasets using Dirichlet (D) and traditional (T) spectral clustering. Each point represents 
one possible cut size; in general, Dirichlet clustering yields many fewer components without 
sacrificing much in Cheeger ratio. 